RefSeq: an update on mammalian reference sequences

Authors

  • Kim D. Pruitt
  • Garth R. Brown
  • Susan M. Hiatt
  • Françoise Thibaud-Nissen
  • Alex Astashyn
  • Olga D. Ermolaeva
  • Catherine M. Farrell
  • Jennifer Hart
  • Melissa J. Landrum
  • Kelly M. McGarvey
  • Michael R. Murphy
  • Nuala A. O'Leary
  • Shashikant Pujar
  • Bhanu Rajput
  • Sanjida H. Rangwala
  • Lillian D. Riddick
  • Andrei Shkeda
  • Hanzhen Sun
  • Pamela Tamez
  • Ray E. Tully
  • Craig Wallin
  • David Webb
  • Janet Weber
  • Wendy Wu
  • Michael DiCuccio
  • Paul A. Kitts
  • Donna R. Maglott
  • Terence D. Murphy
  • James Ostell
Abstract

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI's eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI's eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.

Similar resources

NCBI Reference Sequence Project: update and current status

The goal of the NCBI Reference Sequence (RefSeq) project is to provide the single best non-redundant and comprehensive collection of naturally occurring biological molecules, representing the central dogma. Nucleotide and protein sequences are explicitly linked on a residue-by-residue basis in this collection. Ideally all molecule types will be available for each well-studied organism, but the ...

Discrepancies between human DNA, mRNA and protein reference sequences and their relation to single nucleotide variants in the human population

The protein coding sequences of the human reference genome GRCh38, RefSeq mRNA and UniProt protein databases are sometimes inconsistent with each other, due to polymorphisms in the human population, but the overall landscape of the discordant sequences has not been clarified. In this study, we comprehensively listed the discordant bases and regions between the GRCh38, RefSeq and UniProt referen...

SeedSeq: Off-Target Transcriptome Database

Detection of potential cross-reaction between a short oligonucleotide sequence and a longer (unintended) sequence is crucial for many biological applications, such as high content screening (HCS), microarray nucleotide probes, or short interfering RNAs (siRNAs). However, owing to a tolerance for mismatches and gaps in base-pairing with target transcripts, siRNAs could have up to hundreds of pot...

RefSeq Frequently Asked Questions (FAQ)

The NCBI Reference Sequence (RefSeq) project provides sequence records and related information for numerous organisms, and provides a baseline for medical, functional, and comparative studies. Whereas the International Nucleotide Sequence Database Collaboration (INSDC, made up of GenBank, the European Nucleotide Archive, and the DNA Data Bank of Japan) represents an archival repository of all s...

Chapter 18. The Reference Sequence (RefSeq) Database

NCBI’s Reference Sequence (RefSeq) database is a collection of taxonomically diverse, non-redundant and richly annotated sequences representing naturally occurring molecules of DNA, RNA, and protein. Included are sequences from plasmids, organelles, viruses, archaea, bacteria, and eukaryotes. Each RefSeq is constructed wholly from sequence data submitted to the International Nucleotide Sequence...


Journal title:

Volume 42, Issue

Pages -

Publication date: 2014